perm filename DVIINF.TEX[TEX,DEK] blob sn#581079 filedate 1981-04-24 generic text, type C, neo UTF8
\input basic % DVIINF contains the definitive description of DVI files --drf

% from dochdr:
\def\.#1{\hbox{\def\\{\char'134 }\:t#1}} % typewriter type for strings
\def\TEX{\hbox{\lowercase{\:a \uppercase{T}\hskip-2pt\lower1.94pt
	\hbox{\uppercase{E}}\hskip-2pt \uppercase{X}}}}
\font t=cmtt

%change the page size
%\hsize 12cm
%\vsize 17cm
%\eject

\hbox to size{Stanford University\hfil April 18, 1981}
\vskip 0.3cm
\hbox to size{\TEX\ Project\hfil David Fuchs}
\vskip 0.7cm
\ctrline{\bf The format of \TEX's DVI files.}
\vskip .8cm

\parskip 1pt plus 3pt

\def\dvi{{\:t .DVI}}
\def\le{≤}
\def\and{\mathbin{\char a\char n\char d}}
\def\or{\mathbin{\char o\char r}}
\def\no{\penalty 999\ }
\def\pg{{\bf page}}
\def\pst{{\bf postamble}}
\def\cpotp{{\bf current position on the page}}
\def\cf{{\bf current font}}
\def\ptr{{\bf pointer}}
\def\parm{{\bf parameter}}
\def\cmd{{\bf command}}
\def\ppp{{\bf previous page pointer}}
\def\hc{{\bf horizontal coordinate}}
\def\vc{{\bf vertical coordinate}}
\def\wa{{\bf w-amount}}
\def\xa{{\bf x-amount}}
\def\ya{{\bf y-amount}}
\def\za{{\bf z-amount}}
\def\wxyandza{{\bf w-\rm, \bf x-\rm, \bf y-\rm, and \bf z-amount\rm s}}
\def\zyxandwa{{\bf z-\rm, \bf y-\rm, \bf x-\rm, and \bf w-amount\rm s}}
\def\fontdef{{\bf font definition}}
\def\fontnam{{\bf font name}}
\def\fontnum{{\bf font number}}
\def\fontchk{{\bf font checksum}}
\def\fontmag{{\bf font magnification}}

When \TEX\ compiles a document, it produces an output file that contains
specifications of how \TEX\ has decided the formatted text should appear
in hard copy.  These output files are known as `\dvi' files, which stands
for `device independent'.  For instance, running \TEX\ and telling it to
\.{\\input dviinf} will cause \TEX\ to look for a file called {\:t DVIINF.TEX},
read it, and produce an output file called {\:t DVIINF.DVI}, which is a \dvi\ file.
This document describes the format of \dvi\ files in detail, giving all the
specifications along with examples.

A \dvi\ file contains information about where characters go on pages. The
format is such that almost any reasonable
device can be driven by a program that takes \dvi\ files as input. In
particular, a \dvi\ file can be printed on the Xerox Dover, Xerox Graphics
Printer\no (XGP), Varian, Versatec, Canon and Alphatype at the Stanford CS
Dept., depending on what spooler it is passed to.

The \dvi\ file is a stream of 8-bit bytes, packed in computer words
high-order byte first.  If the computer word length is not evenly
divisible by\no 8, then the extra bits at the low-order end of each word
will be unused.  The first byte in a \dvi\ file is byte number zero, the
next is number one, etc.  For example, on Stanford's 36-bit word machines,
byte number\no 0 is in the highest order eight bits of the first word in a
\dvi\ file, while byte number\no 7 is in the twelfth through fifth least
significant bits of the second word in the file; and the least significant
four bits in every word are zero.
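
The packing rule above can be sketched in Python. This is only an illustration, assuming the words of the file are already available as nonnegative integers; the function name and its arguments are hypothetical and not part of the \dvi\ format.

```python
def bytes_from_words(words, word_bits=36):
    """Unpack 8-bit bytes from machine words, high-order byte first."""
    per_word = word_bits // 8          # 4 whole bytes fit in a 36-bit word
    spare = word_bits - 8 * per_word   # 4 unused bits at the low-order end
    out = bytearray()
    for word in words:
        for k in range(per_word):
            # Byte k of a word sits above the spare bits, highest byte first.
            shift = spare + 8 * (per_word - 1 - k)
            out.append((word >> shift) & 0xFF)
    return bytes(out)
```

With 36-bit words, byte number 7 comes out of the second word shifted right by four bits, which matches the description above.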

A \dvi\ file is actually a series of \cmd s.  A \cmd\ consists of one byte
containing the \cmd's unique number, followed by a number (possibly zero)
of \parm s to the \cmd.  A given \cmd\ always has the same number of \parm
s.  These \parm s may take from one to four bytes each, but a given \parm\ 
of a given \cmd\ always takes the same number of bytes.  Some \parm s may
sometimes be negative, in which case two's complement representation is
used.  The complete list of \dvi\ \cmd s, with a description of each \cmd\ and
its \parm s, is below.  The reader is encouraged to refer to the
\cmd\ list while reading the various examples in this document.

In the \cmd\ descriptions, a lower case letter with a [bracketed] number
following it means that the \cmd\ has a \parm\ that is that number of
bytes long.  An X2 \cmd, for instance, is 3 bytes long, the first byte of
which has the decimal value 144, the second and third of which give the
distance to move to the right.  If the second byte $=S$ and the third
$=T$, then the distance to move is $2↑8S+T$ (but if the high order bit of
$S$ is a one, then the distance to move is $2↑8S+T-2↑{16}$,
considering $S$ and $T$ as being in the range [0..255]).
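
The same rule applies to parameters of any length. A minimal Python sketch follows, assuming the file contents are held in a bytes object; the function name is hypothetical.

```python
def signed_param(data, pos, nbytes):
    """Read an nbytes-long two's complement parameter starting at data[pos]."""
    value = 0
    for b in data[pos:pos + nbytes]:
        value = (value << 8) | b            # high-order byte comes first
    if value >= 1 << (8 * nbytes - 1):      # high-order bit is a one: negative
        value -= 1 << (8 * nbytes)
    return value
```

For the X2 example, the two bytes after the command byte give $2↑8S+T$, less $2↑{16}$ when the high-order bit of $S$ is set.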

The \dvi\ file contains a number of \pg s followed by a \pst.  A \pg\ con\-
sists of a BOP \cmd, followed by lots of other \cmd s that tell where the
characters on the page go, followed by an EOP \cmd.  Each EOP \cmd\ is
immediately followed by another BOP \cmd, or by the PST \cmd, which means
that there are no more \pg s in the file, and the remaining bytes in the
\dvi\ file are the \pst.  Remember that \TEX\ has no
official knowledge of \pg\ numbers (although it does print the value of
\.{\\count0} on your terminal as it outputs each \pg\ on the assumption
that some meaningful number is there), so the only thing that can be said
about the ordering of pages in a \dvi\ file is:  The order in which \pg s
come in a \dvi\ file is the same order in which \TEX\ constructed them,
which is the same order in which the \TEX\ user specified them.  Any blank
or nonexistent \pg\ from a \TEX\ job might not be in the \dvi\ file at all.
If we consider the \pg\ number to be the value of \.{\\count0}, then the
\pg\ following \pg\ number\no 34 in a \dvi\ file might well be
\pg\ number\no $-5$.

Some \parm s of \dvi\ \cmd s are \ptr s.  A \ptr\ is simply a byte number as
discussed above.  A \ptr\ itself is 4 bytes long.  For example, a BOP
\cmd's last \parm\ (\.p[4]) is the BOP's \ppp.  This parameter is the
number of the byte in which the previous \pg's BOP command begins.  In
particular, the {\sl second} \pg's BOP \cmd's \ppp\ \parm\ (\.{p}[4]) is
always zero, since the first \pg's BOP is always in byte zero in a \dvi\ file.
If the first \pg\ in a \dvi\ file had only a BOP and EOP \cmd, then
the {\sl third} \pg's BOP's \ppp\ would be\no 46, since the first \pg's BOP
\cmd\ takes bytes zero through\no 44, the first \pg's EOP is byte\no 45, so the
{\sl second} \pg's BOP is in byte\no 46.
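
The byte arithmetic above can be checked with a short Python sketch that builds pages containing only a BOP and an EOP, and then follows the previous page pointers backwards. The helper names are hypothetical, and the whole file is assumed to be in memory.

```python
import struct

BOP, EOP = 129, 130

def make_empty_page(prev_ptr, count0=0):
    """A BOP (ten 4-byte counters, then the previous page pointer) plus an EOP."""
    counters = [count0] + [0] * 9
    return struct.pack(">B10ii", BOP, *counters, prev_ptr) + bytes([EOP])

def page_starts(data, last_bop):
    """Follow previous page pointers back from the final BOP."""
    starts = []
    ptr = last_bop
    while ptr != -1:                       # the first BOP has a pointer of -1
        if data[ptr] != BOP:
            raise ValueError("pointer %d is not a BOP" % ptr)
        starts.append(ptr)
        # The pointer is the last parameter: 1 command byte + 40 counter bytes.
        (ptr,) = struct.unpack_from(">i", data, ptr + 41)
    return starts[::-1]
```

Three such empty pages start at bytes 0, 46 and 92, so the third BOP carries a pointer of 46.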

When a \dvi-reading program reads the \cmd s for a \pg, it should keep
track of the \cf.  This can be done with a single integer variable, the
value of which will always lie in the range\no [0..$2↑{32}-2$].
The value of the \cf\ is changed only by FONT
and FONTNUM \cmd s.  Whenever a \cmd\ occurs in the \dvi\ file that causes a
character to be set on the \pg, the character is implicitly from the \cf.

Likewise, the program should keep track of the \cpotp.  The \cpotp\ is like
a cursor on the \pg; whenever a character or rule is set, it gets put at
the \cpotp.  The \cpotp\ is just two numbers---which are called \hc\ and
\vc.  Moving to the right on a \pg\ is represented by an increase in \hc,
while moving down is an increase in \vc.  The upper-left-hand corner of
the \pg\ is $\hbox{\hc}=\hbox{\vc}=0$ (i.e., our system is slightly
non-cartesian).  Both {\bf coordinates} are given in rsu's
(ridiculously small units), where  1\no $\hbox{rsu}=10↑{-7}\hbox{meter}$.  This
is so that accumulated errors will be insignificant even in the worst
imaginable case (a ``box'' many feet long).  The \cpotp\ is moved about by
the commands W0, W2, W3, W4, X0, X2, X3, X4, Y0, Y2, Y3, Y4, Z0, Z2, Z3
and Z4:  The \vc\ is changed by Y and Z \cmd s, while the \hc\ is changed
by W and X \cmd s. (The value of \hc\ can also change as a side effect of
setting a character or rule (VERTCHAR and VERTRULE \cmd s)---the \cpotp\ 
moves right the natural width of the character or rule set. The POP \cmd\ 
may also change \cpotp.)

So, whoever or whatever reads a \dvi\ file might have three variables, $F$, $H$
and $V$, to keep track of the \cf\ and the \cpotp.  Four more variables
are also called for:  \wa, \xa, \ya, and \za.  These variables hold not
locations, but distances (in rsu's).  The {\bf amount} variables are
used in \dvi\ files to move the \cpotp\ around:  The \cmd s X0 and W0 add
\xa\ and \wa\ to \hc, respectively, while Y0 and Z0 add \ya\ or \za\ to
\vc, respectively.  There are also a number of \cmd s that change the
value of \wa, \xa, \ya\ or \za\ (W2, W3, W4, X2, X3, X4, Y2, Y3, Y4, Z2,
Z3 and Z4; these commands also change \hc\ or \vc).  Actually, the \dvi-reading
program must have a stack that can hold \hc s and \vc s, as well
as \wxyandza.  These six values always get pushed and popped together, and
a reasonable maximum stack depth might be about\no 200 (times six, since
six items get pushed at once).  As each \pg\ starts, a \dvi\ reading program
should set the {\bf amount} variables to zero.  The stack should be empty.
The initial value of $F$ doesn't matter, since {\sl every} \pg\ of a \dvi\ file
must have a FONT or FONTNUM \cmd\ before any \cmd\ that will set a
character (the HORZCHAR and VERTCHAR \cmd s).  Note that $F$ is {\sl not}
pushed and popped.
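
The bookkeeping just described can be sketched as a small Python interpreter for the commands of one \pg. It is an illustration only: the function name, the caller-supplied width lookup, and the choice to start $H$ and $V$ at zero are assumptions, and rules are skipped rather than drawn.

```python
def run_page(cmds, width):
    """Interpret the commands after a BOP; return (font, char, h, v) tuples.

    width(font, char) is a hypothetical lookup giving a width in rsu's.
    """
    F = None                                  # set by FONT or FONTNUM
    H = V = 0                                 # assumed starting position
    amt = {"w": 0, "x": 0, "y": 0, "z": 0}    # the four amount variables
    stack, placed, i = [], [], 0

    def param(n, signed=True):
        nonlocal i
        val = int.from_bytes(cmds[i:i + n], "big", signed=signed)
        i += n
        return val

    while i < len(cmds):
        op = cmds[i]
        i += 1
        if op <= 127:                         # VERTCHARn: set, then move right
            placed.append((F, op, H, V))
            H += width(F, op)
        elif op == 128:                       # NOP
            pass
        elif op == 130:                       # EOP
            break
        elif op == 132:                       # PUSH all six values together
            stack.append((H, V, dict(amt)))
        elif op == 133:                       # POP them together
            H, V, amt = stack.pop()
        elif op in (134, 135):                # VERTRULE, HORZRULE (not drawn)
            height, wid = param(4), param(4)
            if op == 134:
                H += wid
        elif op == 136:                       # HORZCHAR: set, do not move
            placed.append((F, param(1, signed=False), H, V))
        elif op == 137:                       # FONT
            F = param(4, signed=False)
        elif 138 <= op <= 153:                # W4 W3 W2 W0, then X, Y, Z alike
            letter, k = "wxyz"[(op - 138) // 4], (op - 138) % 4
            if k < 3:
                amt[letter] = param(4 - k)
            if letter in "wx":
                H += amt[letter]
            else:
                V += amt[letter]
        elif 154 <= op <= 217:                # FONTNUM0 .. FONTNUM63
            F = op - 154
        else:
            raise ValueError("undefined command %d" % op)
    return placed
```

Note that $F$ is kept outside the stack, so a POP never restores an earlier font.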

A program called \.{DVITYP} is available that takes any \dvi\ file and prints
a readable description of its contents, together with error messages if the
file is not in the correct format.

\vfill\eject
% list of commands
\def\strut{\lower 3.5pt\vbox to 12pt{}}
\def\cmddescr#1#2
	#3{
	\save0\vbox{
		\hbox{\hbox to 3cm{#1\hfil}\hbox{#2\strut}}
		\moveright 3cm\hbox par 9cm{\strut#3}
		}
	\vfil\penalty -100\vfilneg
	\vskip .5cm
	\box0
	}
\cmddescr{Command Name}{Command Bytes}
	{Description}
\cmddescr{VERTCHAR0}{0}
	{Set character number 0 from the \cf\ such that
	its reference point is at the \cpotp, and then
	increment \hc\ by the character's width.}
\cmddescr{VERTCHAR1}{1}
	{Set character number 1, etc.}
\hbox{\hbox to 3cm{\hss\vdots\hss}\vdots}
\cmddescr{VERTCHAR127}{127}
	{Set character number 127, etc.}
\cmddescr{NOP}{128}
	{No-op, do nothing, ignore.  
	Note that NOPs come {\sl between} \cmd s; they may not come
	between a \cmd\ and its \parm s, or between two \parm s.}
\cmddescr{BOP}{129 \.{c0}[4] \.{c1}[4] $\ldots$ \.{c9}[4] \.{p}[4]}
	{Beginning of \pg.  The \parm\ \.{p} is a \ptr\ to the BOP
	\cmd\ of the {\sl previous} \pg\ in the \dvi\ file 
	(where the {\sl first} BOP in
	a \dvi\ file has a \.{p} of $-1$, by convention).
	The ten \.{c}'s hold the values of \TEX's ten \.{\\count}ers at the time
	this \pg\ was output.}
\cmddescr{EOP}{130}
	{The end of all \cmd s for the \pg\ has been reached. The
	number of PUSH \cmd s on this \pg\ should equal the number of
	POPs.}
\cmddescr{PUSH}{132}
	{Push the current values of \hc\ and \vc, and the current
	\wxyandza\ onto the stack, but don't alter them (so an X0
	after a PUSH will get to the same spot that it would have
	had it been given just before the PUSH).}
\cmddescr{POP}{133}
	{Pop the \zyxandwa, and \vc\ and \hc\ off the stack.  At no point
	in a \dvi\ file will there have been more POPs than PUSHes.}
\cmddescr{HORZRULE}{135 \.{h}[4] \.{w}[4]}
	{Typeset a rule of height \.{h} and width \.{w}, with its bottom
	left corner at the \cpotp.
	If either $\.{h}\le0$ or $\.{w}\le0$, no rule should be set.}
\cmddescr{VERTRULE}{134 \.{h}[4] \.{w}[4]}
	{Same as HORZRULE, but also increment \hc\ by
	\.{w} when done (even if $\.h\le0$ or $\.w\le0$).}
\cmddescr{HORZCHAR}{136 \.{c}[1]}
	{Set character \.{c} just as if we'd gotten the VERTCHAR\.{c}
	\cmd, but don't change the \cpotp.
	Note that \.{c} must be in the range\no [0..127].}
\cmddescr{FONT}{137 \.{f}[4]}
	{Set \cf\ to \.{f}.  Note that
	this \cmd\ is not currently used by \TEX---it is only needed
	if \.{f} is greater than 63, because of the FONTNUM \cmd s below.
	Large font numbers are intended for use with oriental alphabets and
	for (possibly large) illustrations that are to appear in a
	document; the maximum legal number is $2↑{32}-2$.}
\cmddescr{X2}{144 \.{m}[2]}
	{Move right \.{m} rsu's by adding \.{m} to \hc, and
	put \.m into \xa.  Note that \.m is in 2's
	complement, so this could actually be a move to the left.}
\cmddescr{X3}{143 \.{m}[3]}
	{Same as X2 (but has a 3 byte long \.m \parm).}
\cmddescr{X4}{142 \.{m}[4]}
	{Same as X2 (but has a 4 byte long \.m \parm).}
\cmddescr{X0}{145}
	{Move right \xa\ (which can be negative, etc).}
\cmddescr{W2}{140 \.{m}[2]}
	{The same as the X2 \cmd\ (i.e., alters \hc), but
	alter \wa\ rather than \xa, so that doing a W0
	\cmd\ can have different results than doing an X0 \cmd.}
\cmddescr{W3}{139 \.{m}[3]}
	{As above.}
\cmddescr{W4}{138 \.{m}[4]}
	{As above.}
\cmddescr{W0}{141}
	{Move right \wa.}
\cmddescr{Y2}{148 \.{n}[2]}
	{Same idea, but now it's ``down'' rather than ``right'', so
	\vc\ changes, as does \ya.}
\cmddescr{Y3}{147 \.{n}[3]}
	{As above.}
\cmddescr{Y4}{146 \.{n}[4]}
	{As above.}
\cmddescr{Y0}{149}
	{Guess.}
\cmddescr{Z2}{152 \.{m}[2]}
	{Another downer.  Affects \vc\ and \za.}
\cmddescr{Z3}{151 \.{m}[3]}
	{As above.}
\cmddescr{Z4}{150 \.{m}[4]}
	{As above.}
\cmddescr{Z0}{153}
	{Guess again.}
\cmddescr{FONTNUM0}{154}
	{Set \cf\ to 0.}
\cmddescr{FONTNUM1}{155}
	{Set \cf\ to 1.}
\hbox{\hbox to 3cm{\hss\vdots\hss}\vdots}
\cmddescr{FONTNUM63}{217}
	{Set \cf\ to 63.}
\cmddescr{PST}{131 \.p[4] \.n[4] \.d[4] \.m[4] \.h[4] \.w[4]\linebreak
	Fontdef Fontdef ... Fontdef \.{-1}[4] \.q[4] \.i[1] \.{223}[?]}
	{The \pst\ starts here. See below for the full explanation
	of the \parm s of the \pst.}
Commands 218--255 are currently undefined and will not be output by \TEX.
\par

The PST \cmd, which is always the last command in a \dvi\ file, is somewhat
special.  The \parm\ \.p is a \ptr\ to the BOP of the final \pg\ in the
\dvi\ file.
The \parm s \.n and \.d are the numerator and denominator of a fraction
by which all the dimensions in the \dvi\ file should be multiplied to
get rsu's (\TEX\ always outputs a 1 for each of these values; they are
included in \dvi\ format to allow other text systems to conveniently output
\dvi\ files).
The \parm\ \.m is the overall magnification requested by par12 in
the \TEX\ job (par12 is unitless, and is 1000 times the desired
magnification).
Next come \.h and \.w, which are the height of the tallest \pg,
and the width of the widest (both in rsu's).

Next in the postamble come the \fontdef s, one for each font used
in the job (i.e., each FONT and FONTNUM \cmd\ in a \dvi\ file must refer to
a \fontnum\ that has a \fontdef).
The format of a \fontdef\ can be considered to be:
$$\hbox{\.{fnum}[4] \.{fchk}[4] \.{fmag}[4] \.{fnamlen}[1] \.{fnam}[\.{fnamlen}]}$$
The \fontnum\ is held in \.{fnum}.
The \fontchk\ (from the font's \.{TFM} file) is in \.{fchk}.
The \parm\ \.{fmag} holds the \fontmag\ (1000
times the `at size' of the font divided by its `design size' (or
just 1000 if there was no `at' specification for the font)).
Next comes the byte \.{fnamlen},
which is the number of characters in the \fontnam, followed by
the \fontnam, one ascii character per byte (right justified). Note that
the font name includes a directory only if the font is not in the standard
default library directory. From the definitions of the \parm s of the PST
command, note that the end of the \fontdef s is marked by a
\fontnum\ of \.{-1} (which is {\sl not} a legal font number).  
The four bytes following this phony \fontnum\ constitute the \parm\ 
\.q, which is a \ptr\ to
the PST \cmd\ (i.e.,  the beginning of the postamble). Next is
a single byte \parm\ \.i (called the ID byte).
Currently, the ID byte should always
have a value of 1; it will be changed to 2 on the next incompatible
release of \dvi\ format in 1990.
Finally, there are
some number (at least 4) of bytes whose value is 223 (base ten = '337
octal).

The idea of the \.q \ptr\ at the end of the \pst\ is that a \dvi\ reading
program can start at the end of the \dvi\ file, skipping backwards over the
223's, until it finds the ID byte.  Then it can back up 4 bytes, read
\.q, and then do a random seek to that byte number within the \dvi\ file.
Now the \pst\ can be read from start to finish, while storing away the names
and magnifications of all the fonts.  Now the program can jump to the start
of the \dvi\ file and read it sequentially.  The reason for reading the \pst\ 
first is that to figure out where the characters on a page go, the \dvi\ reading
program must know the widths of the characters (see the VERTCHAR commands'
description above).  To find the widths, the \dvi\ reader must know the names
of the fonts so it can get their widths from a \.{TFM} or \.{VNT} (or some other
kind of font) file.  But \TEX\ can't put out all the font names until the end
of the \dvi\ file because new fonts can appear anywhere in the \TEX\ job.  If
\fontdef s were scattered throughout the \dvi\ file, then a spooler that read
\dvi\ files would have to read all the pages of the \dvi\ file, even if the
user only wanted the last page printed.  The decision to put the \fontdef s
in the \pst\ was based on these considerations, and the fact that just
about any reasonable systems language allows random access.  Unfortunately,
standard PASCAL does not offer this feature.  If it is absolutely 
necessary for a \dvi\ reading program to be written in standard PASCAL,
then it either must make two passes over the \dvi\ file, or \TEX\ must be
doctored to output {\sl two} files: the regular \dvi\ file, plus a PST
file, which contains only the \pst.  So far, there have been no reports
of any installation of \TEX\ that required this kind of kludge.
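
The backward scan and the reading of the \pst\ can be sketched in Python as follows. This is a minimal illustration, assuming the whole file is in memory and well formed; the function name and the keys of the returned dictionary are hypothetical.

```python
import struct

def read_postamble(data):
    """Locate and parse the postamble of a file held in a bytes object."""
    end = len(data) - 1
    while data[end] == 223:                   # skip backwards over the 223's
        end -= 1
    id_byte = data[end]                       # the ID byte
    (q,) = struct.unpack_from(">i", data, end - 4)
    if data[q] != 131:
        raise ValueError("q does not point at a PST command")
    p, n, d, m, h, w = struct.unpack_from(">6i", data, q + 1)
    pos, fonts = q + 25, {}                   # 1 command byte + six 4-byte params
    while True:
        (fnum,) = struct.unpack_from(">I", data, pos)
        if fnum == 0xFFFFFFFF:                # font number -1 ends the list
            break
        fchk, fmag, namlen = struct.unpack_from(">IiB", data, pos + 4)
        name = data[pos + 13:pos + 13 + namlen].decode("ascii")
        fonts[fnum] = (name, fmag, fchk)
        pos += 13 + namlen
    return {"id": id_byte, "last_bop": p, "num": n, "den": d,
            "mag": m, "height": h, "width": w, "fonts": fonts}
```

A reader would call this once, load the widths of the listed fonts, and then process the pages it needs.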

A few words on magnification:  If you have a \TEX\ document that does not
mention any `true' dimensions, then if you change just its \.{\\magnify}
statement, the \dvi\ file produced by \TEX\ will change in just {\sl one}
place---the word in the postamble that records the requested magnification.
The idea is that any spooler that reads the \dvi\ file will multiply {\sl all}
dimensions in the \dvi\ file by the magnification; thus the default
magnification in the \dvi\ file may be easily overridden at spooling time.
So, if the document specifies \.{\\magnify$\{1200\}$}, a \.{\\vskip 34cm}
will be recorded in the \dvi\ file as $.34\times10↑7$ rsu's of white space,
but the spooler will multiply this by $1.2$, making $40.8$ centimeters of
white space on output.  If the user tells the spooler to use a
magnification of $1000$ rather than the $1200$ in the \dvi\ file, then the
output will have 34cm of white space.  If a dimension in the document is
specified as being `true', then \TEX\ divides the
distance specified by the prevailing magnification, so that when a spooler
looks at the \dvi\ file and multiplies by the magnification, it gets back
the original distance.  So, if we \.{\\vskip 24truecm} while
the magnification is 1200, \TEX\ puts out \dvi\ \cmd s that specify 20
centimeters of white space.  An output spooler that reads this \dvi\ file then
puts $20\times1.2=24$cm of white space on its output.  Of course, `true'
dimensions will come out `false' if the spooler is told to override the
magnification.
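
The arithmetic in these examples can be reproduced with a few lines of Python. This is a sketch only; the two function names are hypothetical, and exact fractions are used so that no rounding enters the picture.

```python
from fractions import Fraction

RSU_PER_CM = 10**5          # 1 rsu is 1e-7 meter, so 1 cm is 1e5 rsu

def recorded_rsu(cm, mag, true=False):
    """Distance written to the file for a dimension given in the document."""
    rsu = Fraction(cm) * RSU_PER_CM
    # A `true' dimension is divided by the prevailing magnification.
    return rsu * 1000 / mag if true else rsu

def printed_cm(rsu, spool_mag):
    """Distance on paper after the spooler applies its magnification."""
    return rsu * Fraction(spool_mag, 1000) / RSU_PER_CM
```

For instance, a 24truecm skip at magnification 1200 is recorded as 20cm and printed as 24cm, but it prints as 20cm if the spooler is told to use 1000 instead.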

Font magnification goes one step further.  Assume for a moment that
the overall magnification is 1000.  Now, if a \TEX\ job specifies
\.{\\font A=CMR10 at 15pt}, say, that font's magnification is recorded as
1500 in its \fontdef.  When a spooler reads this \dvi\ file, it will try
to use the file \.{CMR10.150VNT} (or \.{CMR10.150ANT}, depending on the device),
which is just like \.{CMR10.100VNT}, but the dimensions of all its characters
were multiplied by $1.5$ before they were digitized.
An uppercase `W' in CMR10 is 10pt wide, but
CMR10 at 15pt has a 15pt wide `W', so after VERTCHAR87 is seen, \hc\ is
increased by ${(15\hbox{pt})}\times(254000\hbox{rsu}/{72.27\hbox{pt}})$.
Overall magnification is taken into account after all other calculations;
for example, at magnification 1200 the font \.{CMR10.180VNT} would be used.
Note that if the user had asked for \.{cmr10 at 15truept}, the factors
would cancel out so that \.{CMR10.150VNT} would be the font chosen regardless
of magnification. The magnification factor is given times 100 in the font
file name so that roundoff error due to several multiplications will not
affect the search for a font with characters of the right size. This
convention about font file names is merely a suggestion, of course; it is
not part of the \dvi\ format per se.
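
The suggested naming convention and the width conversion can be sketched in Python. The function names are hypothetical, and the treatment of the `truept' case (a recorded font magnification of 1250 at overall magnification 1200) is an assumption about how the factors cancel, not something the format prescribes.

```python
from fractions import Fraction

def font_file(name, fmag, mag, ext="VNT"):
    """Font file name under the suggested convention: scale times 100."""
    scale = Fraction(fmag, 1000) * Fraction(mag, 1000)
    return "%s.%d%s" % (name, round(scale * 100), ext)

def rsu_from_pt(points):
    """Convert points to rsu's: 254000 rsu per inch, 72.27 pt per inch."""
    return Fraction(points) * 254000 / Fraction("72.27")
```

Here the overall magnification is combined with the font magnification only in the last step, as described above.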

\vfill\eject
\hbox{\bf Appendix: Comparison between version 0 and version 1.}
\vskip .75cm
Note that \dvi\ files have an ID byte at the end of the \pst,
which tells what version they are.  The changes since version 0 are:

DVI files now use the {\sl upper} bits in a word on machines whose word size
   isn't evenly divisible by 8.

The BOP command has {\sl ten} \.{\\count}er parameters.

The size of rsu's has changed to be $10↑{-7}$ meter.

The postamble has changed to include overall magnification
   as well as a fraction that allows use of non-rsu dimensions.

Font checksum and magnification are new, as is the convention about
   default directory name.

Font descriptions in the postamble give the length of font names rather
   than delimiting them with a quoting character.

The old zero ID byte is now a one.

\vskip 1.75cm
\hbox{\bf Some ideas for version 2.}
\vskip .75cm
Although 1990 is still a ways off, we are currently expecting that
version 2 of \dvi\ files will differ in the following ways:

The ID byte will be 2.

The \.q bytes of the postamble will be preceded by further information:

First `\.s[2]' where \.s is the maximum stack depth (excess of pushes over pops)
	needed to process this file.

Then `\.t[2]' where \.t is the total number of pages (BOP's) in the file.

\vfill\end